Detailed explanations of data results

Project overview

This project gives an overview of buoy motion and statistics over the time period...

Projection used for plotting and calculations

The positions of the buoys, sea sectors, and sea ice have been converted to a polar stereographic projection defined by a PROJ.4 string. This projection resulted in the most accurate distance and bearing calculations as well as positional plotting. The projection closely resembles EPSG:3413, but with a different longitudinal alignment. The parameters `+x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs` are documented at https://nsidc.org/data/user-resources/help-center/guide-nsidcs-polar-stereographic-projection.

Haversine Formula

The haversine formula gives the great-circle distance $d$ between two points $(\phi_1,\lambda_1)$ and $(\phi_2,\lambda_2)$ on a sphere of radius $R$: \[ d = 2R \, \arcsin\!\Bigg( \sqrt{\, \sin^2\!\Big(\tfrac{\phi_2-\phi_1}{2}\Big) + \cos\phi_1\,\cos\phi_2\,\sin^2\!\Big(\tfrac{\lambda_2-\lambda_1}{2}\Big)}\, \Bigg). \] Here $\phi$ denotes latitude and $\lambda$ longitude (in radians).
**Notes.**
(1) Ensure all angles are converted to radians before applying the trigonometric functions: $\theta_\mathrm{rad} = \theta_\mathrm{deg}\,\pi/180$.
(2) $R$ is typically the Earth's mean radius, $R \approx 6371\,\text{km}$.
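
The formula and notes above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: the function name `haversine_km` and the constant `EARTH_RADIUS_KM` are hypothetical, and inputs are assumed to be in decimal degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius from note (2)


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in decimal degrees."""
    # Note (1): convert every angle to radians before using trig functions.
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push the asin argument out of range.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))
```

For example, `haversine_km(0, 0, 0, 1)` returns the length of one degree of longitude along the equator, roughly 111.19 km for this radius.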

Distance Calculation

For the distance to be calculated on a particular day, at least 20 hours must have elapsed since the previous day's observation time.
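
The 20-hour rule can be expressed as a simple timestamp check. The sketch below is illustrative only; the function name `can_compute_daily_distance` is hypothetical, and it assumes both observation times are `datetime` objects in the same time zone.

```python
from datetime import datetime, timedelta

MIN_ELAPSED = timedelta(hours=20)  # threshold stated in the text


def can_compute_daily_distance(previous_obs, current_obs, min_elapsed=MIN_ELAPSED):
    """Return True when enough time separates the two observations."""
    return current_obs - previous_obs >= min_elapsed


# Hypothetical observation times, for illustration only.
previous = datetime(2021, 1, 1, 12, 0)
current = datetime(2021, 1, 2, 9, 30)  # 21.5 hours later
eligible = can_compute_daily_distance(previous, current)
```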

Buoy & SAR Drift Arctic Wide

Analyze and present the SAR ice motion results compared with the in-situ buoy observations. Instructions provided by Chris Jackson.

1) 900121_HourlyVel_Sept2020-Mar2025.xlsx. This is the bible. A set of velocity/displacement results derived from the IABP buoy data. The results are for 1-day ice motions reported at the buoy reporting frequency (in this case every 30 min). It contains displacements and velocities in both Lon/Lat and Polar Stereo format.

2) 900121E_Results_AllCorrelations_0050000m_0000500m.txt. The output of the SAR ice motions program. It is in the same format as the gfilter files, except that these files have not been filtered. Any line where Maxcorr2 < Maxcorr1 should be considered a bad retrieval. The starting Lon/Lat is the buoy location at the time closest to the SAR image time. You might want to recompute the displacement via the Haversine formula.

3) 900121_IMS_Ice_Distance_Good.txt. These results should be paired with the results in 900121E_Results_AllCorrelations_0050000m_0000500m.txt and should be looked at to determine if there is a correlation between bad SAR retrievals and the distance to the ice edge.

4) SAR_IceMotion_IABPMatches_900121E.csv. More of an FYI: this was the total number of possible SAR/buoy overpass pairs. Only about 1100 of these (2600+) resulted in a successful (but possibly invalid) SAR result.

5) 900121_CorrPNG.zip (link sent separately). These are the visual outputs of the results contained in 900121E_Results_AllCorrelations_0050000m_0000500m.txt. It is more of an FYI: it shows the two images being correlated, the correlation image, and the shifted result. You should be able to see the correct shift.

NOTE: Daily drift for ice and buoys is calculated, not taken from the source data. This ensures consistency across the observations. The starting and ending lon/lat for the day are used. The values are cast to EPSG:3413. Distances are calculated from the first lon/lat readings for each observation.

Bad Retrieval Correlation

A bad retrieval is defined as one where MaxCorr2 < MaxCorr1. This may indicate an error in the satellite image measurements. Here, a correlation is computed between the bad retrieval flag (`Y`: 1; `N`: 0) and the `dist_to_water` value. Values close to zero indicate no correlation, while 1 indicates perfect correlation. Negative values mean the two are inversely correlated.
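
As a rough illustration of this step, the sketch below flags bad retrievals and computes a Pearson correlation between the 0/1 flag and the distance to water. It assumes a plain Pearson coefficient is meant, which the text does not state. The function names and the sample rows are hypothetical and are not taken from the data files.

```python
import math


def is_bad_retrieval(maxcorr1, maxcorr2):
    """Bad retrieval as defined above: MaxCorr2 < MaxCorr1."""
    return maxcorr2 < maxcorr1


def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN when either series is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


# Made-up rows: (MaxCorr1, MaxCorr2, dist_to_water)
rows = [(0.9, 0.4, 1.0), (0.8, 0.5, 2.0), (0.3, 0.7, 3.0), (0.2, 0.9, 4.0)]
flags = [1 if is_bad_retrieval(c1, c2) else 0 for c1, c2, _ in rows]
distances = [d for _, _, d in rows]
r = pearson_r(flags, distances)  # negative here: bad retrievals sit nearer to water
```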

Buoy Life Drift with SAR Measurements

All of the data for a buoy is read along with the available SAR drift measurements. Each SAR measurement is matched to the geographically closest buoy position. The buoy's timeframe is also aligned with the timeframe of the SAR drift measurements, which is typically a 24-hour period but not always with the same start time. The SAR measurements are sometimes made from the same satellite but more often from two different satellites. For each buoy position, the code uses the start lon/lat as the point of comparison to other points; the end dates and times are not used.

Buoy Life Drift and SAR Measurements GeoPackage (Overview)

The entire track of the buoy and all SAR measurement tracks for each 24-hour period can be downloaded and viewed in a GIS. The package also contains coastlines at 10m resolution.

Buoy Life Drift and SAR Measurements GeoPackage (Detail)

The entire track of the buoy and the SAR measurement tracks by satellite pairing for each 24-hour period can be downloaded and viewed in a GIS. The package also contains coastlines at 10m resolution.

Buoy life descriptive stats

The information displayed shows the displacement of a specific buoy over a defined time frame. The SAR measurements span that same time frame. Correlation for bearing uses degrees converted to radians. See `Circular correlation (Jammalamadaka-Sarma, 2001)`. Distances are computed with the `pyproj.Geod.inv` method, which solves the inverse geodesic problem on an ellipsoid and gives results close to the Haversine formula. Buoy and SAR distances come from local computations, not from the distances supplied in the source data.

Buoy life scatter plots

The scatter plots show a breakdown of bearings and distances between buoys and SAR drift per season. A regression has been calculated for each season. The `y=x` line is shown for reference and indicates perfect agreement. Seasons are defined as:
Winter ⮕ 12/21 to 3/20
Spring ⮕ 3/21 to 6/20
Summer ⮕ 6/21 to 9/20
Fall ⮕ 9/21 to 12/20
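
The season boundaries listed above can be mapped to a date with a small helper. This is a sketch under the assumption that both boundary dates in each range are inclusive; the function name `season_of` is hypothetical.

```python
from datetime import date


def season_of(d):
    """Return the season name for a date, using the month/day ranges listed above."""
    month_day = (d.month, d.day)
    if (3, 21) <= month_day <= (6, 20):
        return "Spring"
    if (6, 21) <= month_day <= (9, 20):
        return "Summer"
    if (9, 21) <= month_day <= (12, 20):
        return "Fall"
    # Everything else falls in 12/21 to 3/20, which wraps over the new year.
    return "Winter"
```

Comparing `(month, day)` tuples avoids any dependence on the year, so the same helper works across the whole observation period.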

RANSAC

The RANSAC algorithm splits the data into `inliers` and `outliers`. The regression line is calculated on the inliers only. Since each season has some outliers, which could be due to misreadings or other errors, RANSAC can provide a truer relationship. More information is available at https://scikit-learn.org/stable/auto_examples/linear_model/plot_ransac.html#sphx-glr-auto-examples-linear-model-plot-ransac-py
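
To make the inlier/outlier idea concrete, here is a simplified pure-Python sketch of RANSAC for a straight line. It is a stand-in for illustration, not the project's code and not scikit-learn's `RANSACRegressor`, which the linked example uses. The function names, the residual threshold, the trial count, and the sample points are all assumptions.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, residual_threshold, n_trials=200, seed=0):
    """Return (slope, intercept, inliers, outliers) for the largest consensus set."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(n_trials):
        # Fit a candidate line through two randomly chosen points.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate, cannot be written as y = m x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # Points close enough to the candidate line form its consensus set.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= residual_threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no usable consensus set found")
    # Final regression uses the inliers only.
    slope, intercept = fit_line(best_inliers)
    outliers = [p for p in points if p not in best_inliers]
    return slope, intercept, best_inliers, outliers


# Made-up data: ten points on y = 2x + 1 plus two gross outliers.
sample = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(2.0, 30.0), (7.0, -20.0)]
slope, intercept, inliers, outliers = ransac_line(sample, residual_threshold=0.5)
```

An ordinary least-squares fit over all twelve points would be pulled toward the two outliers, while the RANSAC fit recovers the line through the ten consistent points.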

Buoy life bearing plots

The plots use two types of measurements: angles in degrees and angles in radians. The 18° threshold roughly corresponds to the spacing of compass points (N, NNE, NE, ENE, E, etc.). Bearings within 18° of each other can be considered to be heading in the same direction. The correlation heatmap shows each SAR observation grouped by satellite and season, which is then correlated with the corresponding series of buoy observations. A correlation of `1` indicates perfect correlation, while `-1` indicates perfect inverse correlation.
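
Comparing bearings needs care at the 0°/360° wrap-around, since 355° and 5° are only 10° apart. The sketch below shows one way to apply the 18° rule. The function names are hypothetical, and treating exactly 18° as "same direction" is an assumption.

```python
def angular_difference_deg(bearing1, bearing2):
    """Smallest absolute angle in degrees between two bearings, in [0, 180]."""
    diff = abs(bearing1 - bearing2) % 360.0
    return min(diff, 360.0 - diff)


def same_direction(bearing1, bearing2, tolerance_deg=18.0):
    """True when two bearings differ by no more than the tolerance."""
    return angular_difference_deg(bearing1, bearing2) <= tolerance_deg
```

With this helper, `same_direction(355, 5)` is true, while `same_direction(0, 19)` is false.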

Buoy life distribution and frequency

Seasonal SAR and buoy distances were grouped into 1 km bins. Each row represents a specific buoy. We see trends of different distances over the seasons. Winter and early spring show little movement because of the thick sea-ice pack. As the weather warms, the ice melts and movement increases. That is why we see less of a right-skewed distribution from late spring through early fall. The colder months show dense clusters around lower distances (km per 24 hours).

Circular correlation (Jammalamadaka–Sarma, 2001)

Given angle pairs $(\alpha_i,\,\beta_i)$ for $i=1,\dots,n$, let the mean directions be \[ \bar{\alpha}=\operatorname{atan2}\!\Big(\sum_{i=1}^n\sin\alpha_i,\;\sum_{i=1}^n\cos\alpha_i\Big),\quad \bar{\beta}=\operatorname{atan2}\!\Big(\sum_{i=1}^n\sin\beta_i,\;\sum_{i=1}^n\cos\beta_i\Big). \] Then the circular correlation coefficient is \[ \rho_c=\frac{\sum_{i=1}^n\sin(\alpha_i-\bar{\alpha})\,\sin(\beta_i-\bar{\beta})}{\sqrt{\Big(\sum_{i=1}^n\sin^2(\alpha_i-\bar{\alpha})\Big)\,\Big(\sum_{i=1}^n\sin^2(\beta_i-\bar{\beta})\Big)}}\,. \]
**Notes.**
(1) Angles must be in radians inside the trig functions. Degrees are converted via $\theta_\mathrm{rad}=\theta_\mathrm{deg}\,\pi/180$.
(2) $\rho_c\in[-1,1]$, analogous to Pearson's $r$.
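
The two formulas above translate directly into code. The following is a minimal sketch rather than the project's implementation: the function names are hypothetical, inputs are assumed to be in radians already, and returning NaN for a zero denominator is an assumption.

```python
import math


def circular_mean(angles_rad):
    """Mean direction: atan2 of the summed sines and summed cosines."""
    return math.atan2(sum(math.sin(a) for a in angles_rad),
                      sum(math.cos(a) for a in angles_rad))


def circular_correlation(alpha_rad, beta_rad):
    """Jammalamadaka-Sarma circular correlation coefficient for paired angles."""
    if len(alpha_rad) != len(beta_rad) or not alpha_rad:
        raise ValueError("need two non-empty series of equal length")
    alpha_bar = circular_mean(alpha_rad)
    beta_bar = circular_mean(beta_rad)
    sin_a = [math.sin(a - alpha_bar) for a in alpha_rad]
    sin_b = [math.sin(b - beta_bar) for b in beta_rad]
    numerator = sum(x * y for x, y in zip(sin_a, sin_b))
    denominator = math.sqrt(sum(x * x for x in sin_a) * sum(y * y for y in sin_b))
    # Assumption: report NaN when one series has no angular spread.
    return numerator / denominator if denominator > 0 else math.nan


# Bearings in degrees are converted first, as in note (1).
buoy_bearings = [math.radians(d) for d in (10.0, 20.0, 30.0, 40.0)]
mirrored = [-a for a in buoy_bearings]
```

Correlating `buoy_bearings` with itself gives 1, and correlating it with `mirrored` gives -1, matching the range in note (2).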

Accuracy

Accuracy measures the overall fraction of correctly classified pixels, considering both ice and open water. It is defined as the ratio of true positives and true negatives to all evaluated pixels: \[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \] Here, $TP$ is the number of true positives (both datasets report ice), $TN$ is the number of true negatives (both report open water), $FP$ is the number of false positives (dataset 1 reports ice while dataset 2 reports open water), and $FN$ is the number of false negatives (dataset 1 reports open water while dataset 2 reports ice).
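
The four counts and the accuracy ratio can be illustrated with two small boolean masks, where `True` means ice. This is a sketch only: the function names and the sample masks are hypothetical, and returning NaN when no pixels are evaluated is an assumption.

```python
import math


def confusion_counts(ice_1, ice_2):
    """Count (TP, TN, FP, FN) for dataset 1 against dataset 2; True means ice."""
    tp = tn = fp = fn = 0
    for in_1, in_2 in zip(ice_1, ice_2):
        if in_1 and in_2:
            tp += 1  # both report ice
        elif not in_1 and not in_2:
            tn += 1  # both report open water
        elif in_1:
            fp += 1  # dataset 1 ice, dataset 2 open water
        else:
            fn += 1  # dataset 1 open water, dataset 2 ice
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of pixels on which both datasets agree."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else math.nan


# Made-up five-pixel masks.
dataset_1 = [True, True, False, False, True]
dataset_2 = [True, False, False, True, True]
counts = confusion_counts(dataset_1, dataset_2)
```

Here the two masks agree on three of five pixels, so `accuracy(*counts)` evaluates to 0.6.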

Precision (Positive Predictive Value)

Precision quantifies how reliable the ice detections are in dataset 1. It answers: of all pixels labeled as ice by dataset 1, what fraction are also ice in dataset 2? \[ \text{Precision} = \frac{TP}{TP + FP}. \] High precision means there are few false-ice detections (low $FP$). When $TP + FP = 0$, precision is undefined and reported as NaN.

Sensitivity (True Positive Rate)

Recall, also known as sensitivity or true positive rate, measures how well dataset 1 recovers the ice pixels identified by dataset 2. It answers: of all true ice pixels in dataset 2, what fraction are correctly labeled as ice by dataset 1? \[ \text{Recall} = \frac{TP}{TP + FN}. \] High recall means few ice pixels are missed (low $FN$). When $TP + FN = 0$, recall is undefined and reported as NaN.

Specificity (True Negative Rate)

Specificity, or true negative rate, measures how well dataset 1 correctly identifies open water. It answers: of all open-water pixels in dataset 2, what fraction are also labeled as open water by dataset 1? \[ \text{Specificity} = \frac{TN}{TN + FP}. \] High specificity means few water pixels are incorrectly flagged as ice (low $FP$). When $TN + FP = 0$, specificity is undefined and reported as NaN.

F1 denominator (precision + recall)

The intermediate variable `f1_den` is simply the sum of precision and recall: \[ \text{f1\_den} = \text{Precision} + \text{Recall}. \] It appears in the denominator of the F1 score. When this sum is zero or not finite, the F1 score cannot be computed and is reported as NaN.

F1 Score

The F1 score is the harmonic mean of precision and recall, providing a single measure that balances both false positives and false negatives: \[ F_1 = \frac{2\,\text{Precision}\,\text{Recall}}{\text{Precision} + \text{Recall}}. \] The F1 score is high only when both precision and recall are high. It is especially useful when the classes (ice vs open water) are imbalanced. When $\text{Precision} + \text{Recall} = 0$, the F1 score is undefined and reported as NaN.

Balanced Accuracy

Balanced accuracy averages the performance on the positive class (ice) and the negative class (open water) by combining recall (sensitivity) and specificity. It is defined as \[ \text{Balanced Accuracy} = \frac{\text{Recall} + \text{Specificity}}{2}. \] This metric is particularly useful when the number of ice and open-water pixels is highly imbalanced, preventing an over-optimistic assessment that can occur when one class dominates.
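
The ratio-based metrics above, including their NaN conventions, can be computed together from the four counts. The sketch below is illustrative: `safe_ratio` and `classification_metrics` are hypothetical names, and letting NaN propagate into balanced accuracy is an assumption, since the text does not say how that case is handled.

```python
import math


def safe_ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else math.nan


def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, specificity, F1, and balanced accuracy from counts."""
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    specificity = safe_ratio(tn, tn + fp)
    f1_den = precision + recall
    if math.isfinite(f1_den) and f1_den != 0:
        f1 = 2 * precision * recall / f1_den
    else:
        f1 = math.nan  # sum is zero or not finite
    # Assumption: NaN in recall or specificity propagates to the average.
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }


# Made-up counts: TP=2, TN=1, FP=1, FN=1.
metrics = classification_metrics(2, 1, 1, 1)
```

A scene with no ice in either dataset, such as `classification_metrics(0, 5, 0, 0)`, yields NaN for precision, recall, and F1 but a specificity of 1.0.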